Speech analytics

Speech analytics refers to automatic methods of analyzing speech to extract useful information about its content. It often includes elements of automatic speech recognition, in which the identities of spoken words or phrases are determined, but it may also include analysis of other characteristics of the speech.

One application of speech analytics is spotting spoken keywords or phrases, either as real-time alerts on live audio or as a post-processing step on recorded speech. This technique is also known as audio mining. Other uses include categorizing speech, for example identifying calls from dissatisfied customers in a contact center.
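As a minimal sketch of rule-based categorization, the Python below tags calls by matching predefined phrases against their transcripts. It assumes the speech has already been converted to text by a recognizer. The category names, phrases, example calls, and the `categorize` helper are all hypothetical, and plain whole-word matching is used only to keep the example short.

```python
import re

# Hypothetical category rules: a call gets a category if any listed phrase occurs in it.
CATEGORY_RULES: dict[str, list[str]] = {
    "dissatisfied": ["not happy", "speak to a manager", "cancel my account"],
    "billing": ["invoice", "charged twice", "refund"],
}


def categorize(transcript: str, rules: dict[str, list[str]] = CATEGORY_RULES) -> set[str]:
    """Return every category with at least one phrase found in the transcript."""
    text = transcript.lower()
    matched: set[str] = set()
    for category, phrases in rules.items():
        for phrase in phrases:
            # \b on both sides means whole-word matches only ("refund" will not match "refunds").
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                matched.add(category)
                break  # one hit is enough for this category
    return matched


# Hypothetical example calls.
calls = {
    "call-001": "I was charged twice and I want a refund",
    "call-002": "Thanks, that fixed it",
    "call-003": "I am not happy, let me speak to a manager about this invoice",
}
for call_id, text in calls.items():
    print(call_id, sorted(categorize(text)))
```

A call can fall into several categories at once; here `call-003` is tagged as both a billing call and a dissatisfied one.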

Speech analytics in contact centers can be used to extract critical business intelligence that would otherwise be lost. Analyzing and categorizing recorded phone conversations between companies and their customers can surface useful information about strategy, products, processes, operational issues, and agent performance. This gives decision-makers insight into what customers actually think of the company, so that they can react quickly. Speech analytics can also automatically identify areas in which contact center agents may need additional training or coaching, and can automatically monitor the customer service provided on calls.


Technology

There are three main approaches "under the hood": the phonetic approach, large-vocabulary continuous speech recognition (LVCSR, better known as speech-to-text or full transcription), and direct phrase recognition.

Some speech analytics vendors use a third-party recognition engine, while others have developed their own proprietary engines.

Phonetic

This is the fastest approach to process, mainly because the set of recognition units is very small: the basic recognition unit is the phoneme, and most languages have only a few dozen distinct phonemes. The output of recognition is a stream (text) of phonemes, which can then be searched for the phoneme sequences of keywords or phrases.
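To make the search step concrete, here is a minimal Python sketch that looks for a keyword's phoneme sequence in a recognized phoneme stream. The ARPAbet-style symbols, the two-word `LEXICON`, the example stream, and the `find_keyword` helper are hypothetical; a real system would take pronunciations from a full lexicon or a grapheme-to-phoneme model and may search richer recognizer output than a single phoneme string.

```python
# Hypothetical pronunciation lexicon using ARPAbet-style phoneme symbols.
LEXICON: dict[str, list[str]] = {
    "refund": ["R", "IY", "F", "AH", "N", "D"],
    "cancel": ["K", "AE", "N", "S", "AH", "L"],
}


def find_keyword(phonemes: list[str], keyword: str, max_mismatches: int = 0) -> list[int]:
    """Return start indices where the keyword's phonemes occur in the stream,
    allowing up to `max_mismatches` substituted phonemes (Hamming distance)."""
    target = LEXICON[keyword]
    n = len(target)
    hits = []
    for start in range(len(phonemes) - n + 1):
        window = phonemes[start:start + n]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical recognizer output for "I want a refund", with one error (IH instead of IY).
stream = ["AY", "W", "AA", "N", "T", "AH", "R", "IH", "F", "AH", "N", "D"]
print(find_keyword(stream, "refund"))                    # → []
print(find_keyword(stream, "refund", max_mismatches=1))  # → [6]
```

Tolerating substituted phonemes catches more true occurrences at the cost of more false matches, the same accuracy-versus-detection-rate tension discussed under Data Reliability.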

LVCSR (large-vocabulary continuous speech recognition)

Processing is much slower, because the basic unit is the word, modeled in sequences (bigrams, trigrams, etc.), and the audio must be matched against a vocabulary of hundreds of thousands of words. The output, however, is a stream of words, which is richer to work with. This approach can surface new business issues, queries run much faster, and accuracy is generally higher than with the phonetic approach. Most importantly, because the complete semantic context is in the index, business issues can be found and examined very rapidly.
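Because LVCSR output is ordinary text, it can be indexed much as a search engine indexes documents. The sketch below builds a positional inverted index (word to call to word positions) and answers multi-word phrase queries from it without rescanning the transcripts. The helper names and example transcripts are hypothetical; the code illustrates why queries over a word index are fast, not how any particular vendor stores its data.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase words; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(transcripts: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Positional inverted index: word -> {call_id: [word positions]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for call_id, text in transcripts.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][call_id].append(pos)
    return index


def phrase_search(index: dict[str, dict[str, list[int]]], phrase: str) -> list[str]:
    """Call IDs in which the words of `phrase` appear consecutively."""
    words = tokenize(phrase)
    if not words or words[0] not in index:
        return []
    hits = []
    for call_id, starts in index[words[0]].items():
        for start in starts:
            # Every following word must sit at the next position in the same call.
            if all(start + offset in index.get(word, {}).get(call_id, [])
                   for offset, word in enumerate(words[1:], start=1)):
                hits.append(call_id)
                break
    return sorted(hits)


# Hypothetical example transcripts.
transcripts = {
    "call-001": "I would like to cancel my subscription.",
    "call-002": "Please do not cancel the order.",
    "call-003": "My subscription renewal failed.",
}
index = build_index(transcripts)
print(phrase_search(index, "cancel my subscription"))  # → ['call-001']
print(phrase_search(index, "my subscription"))         # → ['call-001', 'call-003']
```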

Direct Phrase Recognition

Rather than first converting speech into phonemes or text, this approach analyzes the audio directly, looking for specific phrases that have been predefined as important to the business. Because no information is lost in an intermediate conversion step, this method generally provides the highest data reliability.
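The text does not say how direct phrase recognition is implemented. One classic way to match a predefined phrase directly against audio features, with no phoneme or word transcript in between, is template matching with subsequence dynamic time warping (DTW): a stored template of feature frames for the phrase is aligned against every stretch of incoming audio, and a low average alignment cost counts as a detection. DTW is swapped in here purely for illustration and is not claimed to be what commercial direct-phrase products use. The toy one-dimensional features, `subsequence_dtw`, and `DETECTION_THRESHOLD` are hypothetical; real acoustic features (such as MFCCs) have many dimensions per frame.

```python
import math

# Hypothetical cut-off on the average per-frame alignment cost.
DETECTION_THRESHOLD = 0.5

Frame = tuple[float, ...]


def subsequence_dtw(template: list[Frame], audio: list[Frame]) -> tuple[float, int]:
    """Align `template` against the best-matching stretch of `audio`.

    Returns (average cost per template frame, index of the audio frame where the match ends).
    """
    n, m = len(template), len(audio)
    # D[i][j] = cheapest alignment of template[:i] that ends at audio frame j - 1.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j] = 0.0  # free start: the phrase may begin anywhere in the audio
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(template[i - 1], audio[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j],      # template advances, audio frame repeats
                                 D[i][j - 1],      # audio advances, template frame repeats
                                 D[i - 1][j - 1])  # both advance
    end = min(range(1, m + 1), key=lambda j: D[n][j])  # free end as well
    return D[n][end] / n, end - 1


# Hypothetical toy features: the phrase template and a longer stretch of audio containing it.
template = [(1.0,), (3.0,), (2.0,)]
audio = [(0.0,), (0.0,), (1.0,), (1.1,), (3.0,), (2.0,), (0.0,)]
cost, end = subsequence_dtw(template, audio)
print(f"cost={cost:.3f}, ends at frame {end}, detected={cost < DETECTION_THRESHOLD}")
```

Normalizing by the template length keeps the threshold comparable across phrases of different durations.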

Data Reliability

According to the US Government Accountability Office[1], “data reliability refers to the accuracy and completeness of computer-processed data, given the uses they are intended for.” In speech recognition and analytics, completeness is measured by the detection rate, and the two measures usually pull against each other: as accuracy goes up, the detection rate goes down.

Accuracy

According to the American Heritage® Dictionary of the English Language, accuracy is defined as: “The ability of a measurement to match the actual value of the quantity being measured.” In speech recognition and analytics, accuracy is the proportion of results in a given result set that were correctly recognized; in information-retrieval terms, this is precision.

The concept of accuracy can be illustrated in layman’s terms with the following example. An executive in San Francisco needs to attend a meeting in New York. The executive calls the company travel agent and asks the agent to e-mail him a list of all the flights from San Francisco to New York. The travel agent e-mails the executive a list of ten flights: nine are from San Francisco to New York, as requested, but one is from Oakland to Newark. Nine correct results out of ten is 90% accuracy.
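The arithmetic in the travel example can be written as a short Python sketch: accuracy is the number of correct results divided by the number of results returned. The flight identifiers and the `is_correct` and `accuracy` helpers are made up for illustration.

```python
# Hypothetical flight IDs: nine San Francisco to New York flights plus one Oakland to Newark flight.
returned = [f"SF-NY-{n}" for n in range(1, 10)] + ["OAK-EWR-1"]


def is_correct(flight_id: str) -> bool:
    """A result is correct if it is a San Francisco to New York flight."""
    return flight_id.startswith("SF-NY-")


def accuracy(results: list[str]) -> float:
    """Fraction of returned results that are correct."""
    return sum(is_correct(r) for r in results) / len(results)


print(f"{accuracy(returned):.0%}")  # → 90%
```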

Completeness (Detection Rate)

Completeness is defined as: “The state of being complete and entire; having everything that is needed.” In speech recognition and analytics, completeness is measured by the detection rate: the proportion of the actual occurrences of a given event or word that the system finds. In information-retrieval terms, this is recall.

To continue the travel agency example, imagine that after receiving the list from the travel agent, the executive goes to an online travel website and searches for flights from San Francisco to New York. The search returns 18 such flights, and the travel agent's list contained only 9 of them. The agent found nine of the 18 flights, a 50% detection rate.
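Detection rate divides the number of actual occurrences that were found by the total number of actual occurrences. In the sketch below, the 18 flights from the website search stand in for the complete set of actual occurrences; the flight IDs and the `detection_rate` helper are hypothetical.

```python
# Hypothetical IDs: the website search returns all 18 San Francisco to New York flights.
all_flights = {f"SF-NY-{n}" for n in range(1, 19)}
# The travel agent's list: nine of those flights plus one Oakland to Newark flight.
agent_list = [f"SF-NY-{n}" for n in range(1, 10)] + ["OAK-EWR-1"]


def detection_rate(found: list[str], actual: set[str]) -> float:
    """Fraction of actual occurrences that appear among the results."""
    return len(actual.intersection(found)) / len(actual)


print(f"{detection_rate(agent_list, all_flights):.0%}")  # → 50%
```

The incorrect Oakland to Newark entry does not change the detection rate; it only lowers accuracy.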

Data reliability is critical to sound decision-making. However, most speech analytics vendors quote only their accuracy rates when discussing the reliability of their data and neglect to mention their detection rates, even though detection rates can fall significantly as accuracy rises.
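One way this trade-off arises is through a confidence threshold: the recognizer reports a hit only when its score clears the threshold. The sketch below uses made-up scores for eight hypothetical keyword detections (five correct, three false alarms) against six actual occurrences in the audio, and shows that raising the threshold improves accuracy while lowering the detection rate. The scores, threshold values, and `evaluate` helper are illustrative assumptions, not measurements from any product.

```python
# Hypothetical detections: (recognizer confidence, whether the hit is actually correct).
DETECTIONS = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.50, False), (0.40, True), (0.30, False),
]
ACTUAL_OCCURRENCES = 6  # hypothetical: one occurrence in the audio was never detected at all


def evaluate(threshold: float) -> tuple[float, float]:
    """Return (accuracy, detection rate) for hits whose confidence is >= threshold."""
    kept = [correct for score, correct in DETECTIONS if score >= threshold]
    hits = sum(kept)
    accuracy = hits / len(kept) if kept else float("nan")  # undefined when nothing is reported
    return accuracy, hits / ACTUAL_OCCURRENCES


for threshold in (0.3, 0.5, 0.75):
    acc, det = evaluate(threshold)
    print(f"threshold={threshold:.2f}  accuracy={acc:.0%}  detection rate={det:.0%}")
```

With these numbers, moving the threshold from 0.3 to 0.75 raises accuracy from 5/8 to 3/3 while the detection rate falls from 5/6 to 3/6.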

Business value

Competitive advantage often depends on anticipating market needs faster and more visibly than competitors, and customers' own words are one of the most direct sources of information about a business. Speech analytics can help extract valuable intelligence from thousands, or even millions, of customer calls, so that a company can act quickly.

Although contact centers record customer conversations, the sheer number of recordings can easily exceed the organization's ability to review and analyze them. Speech analytics solutions can mine recorded customer interactions to surface the intelligence needed for effective cost-containment and customer service strategies. Used together with other workforce optimization components such as quality monitoring and agent scorecards, speech analytics can help pinpoint cost drivers, trends, and opportunities; identify strengths and weaknesses in processes and products; and show how a company's offerings are perceived in the marketplace. In this way, captured interactions become actionable intelligence for the entire enterprise.

Speech analytics is designed with the business user in mind. It can typically provide automated trend analysis that shows what is happening in the contact center: it can isolate the words and phrases used most frequently within a given time period and indicate whether their usage is trending up or down. This makes it easier for supervisors, analysts, and others in the organization to spot changes in consumer behavior and act to reduce call volumes and increase customer satisfaction.
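A minimal version of this kind of trend analysis compares, for each tracked term, the share of calls that mention it in two periods. The sketch assumes transcripts are already available as text and tracks single words only; the example calls, the `call_share` and `trend` helpers, and the 5-percentage-point `tolerance` are hypothetical.

```python
import re


def call_share(transcripts: list[str], term: str) -> float:
    """Fraction of calls whose transcript contains `term` as a whole word."""
    if not transcripts:
        return 0.0
    hits = sum(term in re.findall(r"[a-z']+", t.lower()) for t in transcripts)
    return hits / len(transcripts)


def trend(before: list[str], after: list[str], term: str, tolerance: float = 0.05) -> str:
    """Label a term 'up', 'down', or 'flat' by the change in its call share."""
    change = call_share(after, term) - call_share(before, term)
    if change > tolerance:
        return "up"
    if change < -tolerance:
        return "down"
    return "flat"


# Hypothetical transcripts from two consecutive weeks.
last_week = ["I want a refund", "How do I reset my password?", "Password reset, please"]
this_week = ["Refund please", "I need a refund", "My refund never arrived", "Reset my password"]
for term in ("refund", "password"):
    print(term, trend(last_week, this_week, term))
```

Counting the share of calls rather than raw word counts keeps one long, repetitive call from dominating the trend.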

References

  1. "Assessing the Reliability of Computer-Processed Data". United States General Accounting Office. http://www.gao.gov/new.items/d03273g.pdf.